SOLR-13309: Introduce FloatRangeField to expose Lucene 'FloatRange'#4229
SOLR-13309: Introduce FloatRangeField to expose Lucene 'FloatRange'#4229gerlowskija merged 4 commits intoapache:mainfrom
Conversation
This commit adds a new field type, FloatRangeField, that can be used to
hold singular or multi-dimensional (up to 4) ranges of floats.
FloatRangeField is compatible with the previously added
`{!numericRange}` and supports similar syntax.
| * Regex fragment matching a comma-separated list of signed floating-point numbers (integers or | ||
| * floating-point literals). | ||
| */ | ||
| protected static final String COMMA_DELIMITED_FP_NUMS = |
There was a problem hiding this comment.
[0] All of these regexes are a bit messy.
Ultimately the idea here is that we want the validation for each implementing type (int, float, etc.) to be specific to what that type looks like. IntRangeField needs to be able to reject fp-values, etc.
This validation is done by regex. The regexes live in this base class because some code here that invotes this verification. Individual sub-classes specify the regex pattern they want to use by using overrideable methods getRangePattern and getSingleBoundPattern.
So that's my rationale here. Open to other ways of doing it if folks can see a better approach....
There was a problem hiding this comment.
Your current approach is already really clean IMO, since it keeps validation in one place.
I did consider a more type agnostic two-step approach: use just one regex to validate the overall[... TO ...]structure to reduce some of the regex complexity, then let parseFloatArray in FloatRangeField.java reject invalid values downstream. But mismatched-dimension edge cases would need separate handling, so that ends up splitting the validation rather than really simplifying it, which also doesn't align with your rationale.
- One small cleanup idea: since
RANGE_PATTERN_STRandFP_RANGE_PATTERN_STRshare the same overall structure and mainly differ in the numeric regex fragment they embedded, would it be worth introducing a small helper likebuildRangePattern(String numericFragment)to centralise the range pattern assembly, and maybe even the compilation step as well? That might reduce the duplication a little.
HoustonPutman
left a comment
There was a problem hiding this comment.
In general looks great, the validation for floating point numbers could always be improved in the future.
solr/core/src/java/org/apache/solr/search/numericrange/NumericRangeQParserPlugin.java
Outdated
Show resolved
Hide resolved
| protected static final String COMMA_DELIMITED_FP_NUMS = | ||
| "-?\\d+(?:\\.\\d+)?(?:\\s*,\\s*-?\\d+(?:\\.\\d+)?)*"; | ||
|
|
||
| private static final String FP_RANGE_PATTERN_STR = |
There was a problem hiding this comment.
I noticed a small gap in the float validation regex: it doesn't accept scientific notation. I tested FP_RANGE_PATTERN_STR pattern on regex101.com, and an input like [1.1 TO 1.5e10] does not match.
Was the exclusion of scientific notation intentional? Or are values like this expected to be rejected before reaching this validation step?
There was a problem hiding this comment.
It was intentional, but I'm starting to second guess that decision now. My assumption at the time was that our existing float and double fields (e.g. DoublePointField) also didn't support scientific notation. There's no mention of that in the ref-guide afaict at least.
But in testing it just now I can see that they do actually support it.
Thanks for raising this - will fix.
…4229) Mirrors recently added 'IntRangeField' and 'LongRangeField' types.
Description
We recently added Solr field types "IntRangeField" and "LongRangeField" to allow users to index and search integer and long ranges (e.g. prices, business hours, grade ranges). But not everything is an int/long. Lucene has additional "range" types:
DoubleRange, andFloatRange.This PR tackles exposing
FloatRangein Solr asFloatRangeField. This is ideal for things that may not fit as a standard "whole number".Solution
This approach takes a very similar approach to that taken by #4192. AbstractNumericRangeFieldType contains most of the plumbing common to these "range" types, so the new implementation added by this PR
FloatRangeFieldcontains only that code that differs from its base/parent class.The main difference between FloatRangeField and the pre-existing IntRangeField and LongRangeField is the value parsing, which accepts floating point values that (e.g.
3.14) that would be rejected as bounds in an IntRangeField.Tests
FloatRangeFieldTest has unit tests for the new functionality, with more "integration" functionality tested in NumericRangeQParserPluginFloatTest. Both of these test classes borrow heavily from their int and long counterparts.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.